Design of High-Speed, Low-Power Sensing Circuits for Nano-Scale Embedded Memory

This paper comparatively reviews sensing circuit designs for the most widely used embedded memory, static random-access memory (SRAM). Many sensing circuits for SRAM have been proposed to improve power efficiency and speed, because sensing operations in SRAM largely determine the overall speed and power consumption of the system-on-chip. This dependence is more pronounced in the nanoscale era, where SRAM bit-cells implemented with near minimum-sized transistors are strongly affected by variation. Under this condition, stable sensing requires the control signal that accesses the selected bit-cell (the word-line, WL) to be asserted for a long time, which increases both power dissipation and delay. Sensing circuits that reduce the WL pulse width can therefore improve sensing power and speed simultaneously. Throughout this paper, the strengths and weaknesses of many SRAM sensing circuits are discussed in terms of speed, area, power, and other aspects.


Introduction
System-on-chip design encounters considerable challenges related to power consumption and latency, with a significant influence from static random-access memory (SRAM) [1][2][3][4]. Thus, efficiently managing SRAM power consumption and improving SRAM access speed are highly important. Although reducing the supply voltage (VDD) is effective in reducing power consumption, it introduces performance and stability trade-offs. In particular, the SRAM bit-cell, the circuit component that stores binary data, is typically built with near minimum-sized transistors to achieve high-density integration, resulting in significant performance variability due to process variation [5][6][7][8]. Furthermore, to address read stability issues, read-assist circuits are employed to suppress the word-line voltage, which can exacerbate performance degradation. Consequently, optimizing SRAM circuits to minimize both power consumption and delay becomes crucial.
By analyzing the read operation, we can identify a way to reduce power consumption and delay in SRAM simultaneously. During the read operation, the bit-cell generates a voltage difference across the bit-line pair. A sensing circuit then measures this voltage difference and delivers the result to the external system. Importantly, the bit-line pair has a large capacitance, enough to make it the dominant contributor to both delay and power consumption during the read operation. Consequently, when the read operation requires a large bit-line voltage swing, delay and power consumption inevitably increase. Thus, reducing the bit-line swing during the read operation can decrease power consumption and delay at the same time [9][10][11].
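As a first-order illustration of this argument (a sketch, not a formula taken from the reviewed works), assume a hypothetical lumped bit-line capacitance C_BL that is discharged by a roughly constant bit-cell current I_cell. Under these assumptions, both the time needed to develop the bit-line swing and the energy needed to restore it scale linearly with the swing ΔV_BL:

```latex
t_{\mathrm{access}} \approx \frac{C_{BL}\,\Delta V_{BL}}{I_{\mathrm{cell}}},
\qquad
E_{\mathrm{restore}} \approx C_{BL}\,V_{DD}\,\Delta V_{BL}
```

In this simplified model, halving the required swing roughly halves both the access time and the energy drawn from the supply to restore the bit-line.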
Sensors 2024, 24, 16
However, it is highly challenging to reduce the bit-line voltage swing. This is because sensing circuits, especially the sense amplifier (SA) responsible for detecting the bit-line swing, require a sufficiently large bit-line voltage difference (ΔVBL) for correct operation. This requirement arises from transistor mismatch within the SA, which makes its characteristics asymmetric. The minimum input voltage difference (in this case, ΔVBL) required for stable SA operation is known as the SA offset voltage (VOS). To reduce ΔVBL, it is therefore essential to lower VOS.
Consequently, numerous prior works have proposed ways to reduce VOS, the most important performance metric of SAs. The simplest method is to use wider transistors in the SA, which reduces the mismatch between paired transistors. However, this approach incurs area and power overhead. To reduce VOS while minimizing the area and power overhead, various offset-reducing circuit techniques have been proposed. This paper conducts a comparative analysis of these circuits, explaining their effectiveness in reducing VOS and achieving power and performance benefits.
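The benefit of wider transistors is often estimated with Pelgrom's mismatch model. The model is not stated in this paper and is quoted here only as a commonly used approximation. With A_VT a technology-dependent matching coefficient and W and L the transistor width and length (symbols introduced here for illustration), the standard deviation of the threshold-voltage difference within a matched pair is approximately:

```latex
\sigma(\Delta V_{th}) \approx \frac{A_{VT}}{\sqrt{W L}}
```

Under this model, halving the mismatch requires roughly four times the gate area, which suggests why upsizing alone quickly becomes costly in area and power.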
The rest of this paper is organized as follows. Section 2 provides background on SRAM read operations and conventional SRAM sensing circuits, including their limitations, as a foundation for the subsequent content. Section 3 introduces various previously proposed SRAM sensing circuits designed to reduce VOS and thereby improve speed and power efficiency. Section 4 presents a comparative analysis and discussion of the sensing circuits introduced in Section 3 from various perspectives.

Background on SRAM Read Operation and Conventional Sensing Circuits
Figure 1 presents the simplified circuits involved in the read operation of a conventional SRAM. In the following, we briefly explain the structure and operation of each circuit shown in Figure 1.
At the top of Figure 1, the bit-cell is composed of six transistors. In this 6T bit-cell, M1, M2, M3, and M4 form two cross-coupled inverters that store and latch the binary data at two storage nodes, QT and QC. The two access transistors, M5 and M6, control the connections between the bit-line pair (BLT and BLC) and the storage nodes (QT and QC). When the WL is activated (i.e., WL = 1), the access transistors turn on and connect the bit-lines to the storage nodes.
Next, the bit-line pre-charge circuit is formed of MPCT, MPCC, and MEQ. These transistors are controlled by the active-low pre-charge signal, PCB, which drives their gates. When PCB = 0, MPCT and MPCC turn on to pre-charge BLT and BLC to VDD, while MEQ ensures that BLT and BLC are pre-charged to equal voltages.
The column multiplexer (MUX), implemented with MC1, MC2, ..., MC8, selects one bit-line pair from multiple pairs (four pairs in Figure 1) and connects it to the SA input pair, SLT and SLC. The bit-line pair to be connected is determined by the column address signal, COLB[0:3], of which only one bit is set low.
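The active-low, one-hot column selection can be sketched behaviorally as follows. This is only an illustrative model: the function names `colb_for_column` and `column_mux` are hypothetical, and the pass transistors are treated as ideal switches.

```python
def colb_for_column(selected, n_columns=4):
    """Return the active-low one-hot COLB vector that selects one column."""
    if not 0 <= selected < n_columns:
        raise ValueError("selected column out of range")
    # Only the selected column's bit is low (0); all others stay high (1).
    return [0 if i == selected else 1 for i in range(n_columns)]


def column_mux(bitline_pairs, colb):
    """Pass the (BLT, BLC) pair whose COLB bit is low to (SLT, SLC)."""
    if len(bitline_pairs) != len(colb):
        raise ValueError("one COLB bit is needed per bit-line pair")
    if colb.count(0) != 1:
        raise ValueError("exactly one COLB bit must be low")
    return bitline_pairs[colb.index(0)]


if __name__ == "__main__":
    # Four bit-line pairs as in Figure 1; the voltages are illustrative only.
    pairs = [(1.0, 0.9), (1.0, 1.0), (0.95, 1.0), (1.0, 1.0)]
    slt, slc = column_mux(pairs, colb_for_column(2))
    print(slt, slc)
```

The check that exactly one bit is low mirrors the requirement in the text; a real design would enforce this in the column decoder rather than in the MUX itself.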
The SA plays a key role in the SRAM read operation. It amplifies the voltage difference between SLT and SLC into a full-swing logic voltage, which is then available at the SA's differential outputs, SOT and SOC. Two commonly used conventional SA structures are the voltage-type latch SA (VLSA) and the current-type latch SA (CLSA), shown in Figure 2a,b, respectively [48]. Unlike VLSAs, CLSAs receive the SA input voltages, SLC and SLT, at the gates of the access transistors, MS1 and MS2. Therefore, the SA inputs drive a high impedance, and the CLSA is less sensitive to timing mismatch. However, CLSAs need additional transistors for sensing and therefore have lower speed, higher energy consumption, and a larger area than VLSAs. The SA enable signal (SAE), connected to MS5-MS7 of the VLSA and MS7-MS9 of the CLSA, triggers the amplifying operation of the SA.
Figure 3 provides operational waveforms of the relevant signals during the conventional SRAM read operation, divided into three phases: the pre-charge phase, the access phase, and the evaluation phase. In the pre-charge phase, PCB becomes low, which pre-charges the bit-lines (BLT and BLC) and the SA inputs (SLT and SLC) to VDD through the bit-line pre-charge circuit and the SA input pre-charge circuit. Then, the access phase starts by setting PCB = 1 to turn off the pre-charge circuits, while the WL for the selected bit-cell is asserted to reflect the data at QT and QC onto the bit-line pair, BLT and BLC. Figure 3 shows an example of a bit-cell storing datum "1" (QT = 1 and QC = 0). In this example, BLT remains high while BLC falls due to the bit-cell current through M6, creating a voltage difference between BLT and BLC. By lowering COLB[i] for the selected column, the column MUX transistors transfer only the selected bit-line pair voltage to the SA inputs, SLT and SLC.
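The three phases can be summarized with a small behavioral sketch. It assumes a constant bit-cell current, a lumped bit-line capacitance, an ideal column MUX, and an ideal (offset-free) SA. The function name `simulate_read` and all parameter values are hypothetical and chosen only for illustration, and the datum "0" case is assumed to be the mirror image of the datum "1" case described above.

```python
def simulate_read(stored_bit, vdd=0.8, c_bl=20e-15, i_cell=10e-6, t_wl=100e-12):
    """Idealized read: returns (sensed_bit, input voltage difference in volts)."""
    # Pre-charge phase (PCB = 0): both bit-lines are pulled up to VDD.
    blt = blc = vdd

    # Access phase (PCB = 1, WL = 1): the bit-line on the side storing "0"
    # is discharged by the cell current for the duration of the WL pulse.
    swing = min(vdd, i_cell * t_wl / c_bl)
    if stored_bit == 1:      # QT = 1, QC = 0: BLC falls
        blc -= swing
    else:                    # QT = 0, QC = 1: BLT falls (assumed symmetric)
        blt -= swing

    # Column MUX (selected COLB[i] low): bit-lines pass to the SA inputs.
    slt, slc = blt, blc

    # Evaluation phase (SAE = 1): an ideal SA resolves the sign of SLT - SLC.
    sensed = 1 if slt > slc else 0
    return sensed, abs(slt - slc)


if __name__ == "__main__":
    bit, delta_v = simulate_read(1)
    print(bit, round(delta_v * 1e3, 1), "mV")
```

Because the SA in this sketch is ideal, any nonzero swing is sensed correctly, so the model captures only the sequencing of the phases and the linear growth of the swing with the WL pulse width.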
During the subsequent evaluation phase, the SA enable signal (SAE) becomes high to trigger the positive feedback in the SA. In this manner, a small voltage difference between SLT and SLC, ΔVIN,SA (see Figure 3), is amplified into a full-swing digital voltage difference at the SA output nodes, SOT and SOC. For example, the sensing operation of the VLSA in Figure 2a is shown in Figure 4.
When the sensed datum is "1", SLT remains at VDD while SLC decreases due to the bit-cell current, reaching VDD − ΔVIN,SA, as shown on the left side of Figure 4. The voltages at the SA outputs, SOT and SOC, equal those at SLT and SLC, respectively, through the pass transistors MS5 and MS6. When the evaluation phase begins, SAE rises, and current flows through the paired nFETs.
The currents through the nFETs MS1 and MS2 in the SA are depicted as IS1 and IS2 in the middle of Figure 4. At the beginning of the evaluation phase, the VGS of MS2 (SOT = VDD) is greater than that of MS1 (SOC = VDD − ΔVIN,SA). Consequently, IS2 > IS1, which makes SOC fall faster than SOT. This leads to positive feedback, formed by MS1-MS2-MS3-MS4. As a result, SOT and SOC eventually reach VDD and 0 V, respectively, as shown on the right side of Figure 4, indicating a successful "1" datum sensing process.
However, stable SA operation is not always guaranteed. Figure 5 shows a scenario in which a sensing failure occurs. The access phase is the same as in the normal sensing operation (the left side of Figure 5). However, problems can arise when the evaluation starts by triggering the SA, as shown in the middle of Figure 5. Note that, although the VGS of MS2 (SOT = VDD) is greater than the VGS of MS1 (SOC = VDD − ΔVIN,SA), IS2 < IS1. This can occur because of a mismatch within the MS1-MS2 pair, specifically because the Vth of MS1 is lower than the Vth of MS2 [22]. Consequently, SOT (initially VDD) falls more quickly than SOC (initially VDD − ΔVIN,SA). Therefore, SOT and SOC end up at 0 V and VDD, respectively, meaning that the attempt to sense datum "1" fails.
Here, the key point is that the mismatch between the paired transistors is responsible for the sensing failure. To prevent this failure, ΔVIN,SA should be large enough to compensate for the effects of the transistor mismatch. The minimum ΔVIN,SA required for stable sensing is the offset voltage of the SA, referred to as VOS; stable sensing thus requires ΔVIN,SA > VOS. This VOS problem becomes more severe in low-VDD regions and is significantly affected by temperature [49,50]. To meet this condition, the WL pulse width is extended to achieve a sufficiently large ΔVBL, which, in turn, results in a large ΔVIN,SA. However, this increased ΔVBL requirement not only increases delay but also raises power consumption, since more power is needed to pre-charge the large capacitance of the BL pair, which stems from the long BL wire and the parasitic capacitance of the bit-cells.
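The relation between the input difference and mismatch can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that each SA instance has an input-referred offset drawn from a zero-mean Gaussian distribution with a hypothetical standard deviation `sigma_offset`, and that sensing datum "1" fails whenever ΔVIN,SA does not exceed that instance's offset. The function name and all values are assumptions, not data from the reviewed works.

```python
import random


def sensing_failure_rate(delta_v_in, sigma_offset, trials=100_000, seed=0):
    """Estimate the fraction of SA instances that mis-sense datum "1"."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    failures = 0
    for _ in range(trials):
        # Input-referred offset of one randomly mismatched SA instance.
        offset = rng.gauss(0.0, sigma_offset)
        # Sensing fails when the input difference does not overcome the offset.
        if delta_v_in <= offset:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    sigma = 0.02  # hypothetical 20 mV offset standard deviation
    for delta_v in (0.0, 0.02, 0.06, 0.12):
        print(delta_v, sensing_failure_rate(delta_v, sigma))
```

In this simplified picture, lowering the offset spread lets the same failure rate be reached with a smaller ΔVIN,SA, and therefore with a shorter WL pulse.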
Although employing large transistors in the sensing circuit can mitigate the mismatch problem, it incurs power, speed, and area overhead in the sensing stage [18]. In addition, various replica bit-line delay and self-timed SAE generation techniques have been proposed to minimize the WL pulse width [51][52][53][54][55][56][57][58], but their benefit is limited because they cannot account for local variations. The speed and power issue caused by the ΔVBL requirement in SRAM becomes more severe in today's advanced nanometer-scale technology nodes, because WL-suppression assist circuits are widely used, which require longer WL pulses to meet the ΔVBL requirement [59][60][61][62].
Therefore, it would be highly beneficial to reduce VOS, as doing so would relax the demand for a large ΔVBL. In the following section, we describe SRAM sensing circuits designed to reduce VOS to improve speed and power efficiency. We explore these circuits in terms of their structure, operation, and key performance characteristics.

Schmitt Trigger Sense Amplifiers
Schmitt triggers are often used to improve the robustness of a standard inverter by modifying its switching threshold. Utilizing this feature, the authors of [24][25][26] proposed the Schmitt trigger-based SA (STSA) to reduce VOS; one example structure is shown in Figure 6a. This structure weakens the pull-down network of the inverter holding the high voltage relative to that of the inverter holding the low voltage.

For example, when SLT is VDD and SLC is VDD − ΔVIN,SA for datum "1" sensing, SOT and SOC become VDD and VDD − ΔVIN,SA, respectively, at the end of the access phase. When the evaluation phase starts with SAE rising, MS5 is turned on more strongly than MS6 because SOT > SOC. Thus, the ZT node (the source of MS3) is pulled up more strongly than ZC (the source of MS4). Because this scheme adjusts not only the gate voltages but also the source voltages of MS3 and MS4 according to SOT and SOC, the VGS of MS3 is greatly suppressed. That is, the VGS difference between the two paired nFETs (MS3-MS4) in the STSA is larger than that between MS1-MS2 in the VLSA, which makes the STSA more tolerant to mismatch. In this manner, the STSA attempts to provide a lower VOS than the VLSA.
However, the STSA has a limited ability to reduce VOS. This is because the STSA contains additional transistor pairs, so the mismatch effect can be larger. In particular, the mismatch between MS5 and MS6 and the mismatch between MS1 and MS2, which are not present in the VLSA, increase the asymmetry in the SA and thus VOS. Nevertheless, the circuit technique implemented in the STSA by MS1, MS2, MS5, and MS6 effectively mitigates these mismatch effects, compensating for the increase caused by the additional transistor pairs. As a result, the final VOS is lower than that of the VLSA. On the other hand, the sensing delay is longer than that of the VLSA because of the stacked nFET structure [26].
To mitigate the speed problem of STSAs, the voltage-boosted STSA (VBSTSA) was proposed [27], as shown in Figure 6b. In the VBSTSA, the negative voltage generator (NVG) used for the negative bit-line write-assist circuit is reused to accelerate the operation of the STSA. When the NVG operation starts, BSTEN rises and BSTENb falls. The falling BSTENb turns off MS13, which was holding OUT at VSS, leaving OUT floating. Subsequently, after MS13 is completely turned off, BSTENd, delayed through inverters, falls, and OUT is lowered to a negative voltage through the coupling capacitor, C. Note that BSTENd should fall only after MS13 is fully turned off; therefore, the inverters in the NVG should provide sufficient delay. Specifically, the ground voltage for the SA is pulled down to the negative voltage at the rising edge of SAE and is 0 V otherwise. This is realized by a switch that is turned on only when SAE is high and delivers the negative voltage generated by the NVG. Although sensing speed can be enhanced in this manner, it incurs a significant power overhead. In addition, NVGs are not always available from write-assist circuits; other types of write-assist circuit, such as cell voltage collapse write assist, do not use NVGs.
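The negative level reached at OUT can be estimated with a simple capacitive-divider sketch. It assumes, hypothetically, that OUT is ideally floating at 0 V when the coupling occurs, that BSTENd falls by a full VDD, and that C_L lumps all load and parasitic capacitance at OUT. The symbol C_L and these idealizations are introduced here for illustration and do not come from [27].

```latex
V_{OUT} \approx -\,\frac{C}{C + C_{L}}\,V_{DD}
```

Under these assumptions, a larger coupling capacitor C gives a deeper negative level, at the cost of additional area and of the energy needed to drive it.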

Hybrid Latch-Type Sense Amplifiers
Some previously proposed SAs combine the features of VLSAs and CLSAs to reduce VOS; these can be referred to as hybrid latch-type SAs (HYSAs) [28][29][30][31][32][33]. Figure 7a shows one example of an HYSA, the variation-tolerant SA (VTSA) proposed in [32]. For consistency with the explanations of the other structures, the polarity in this VTSA example is reversed from the original structure. The VTSA is primarily based on the CLSA structure but also incorporates features of the VLSA. Specifically, the SA outputs, SOT and SOC, are pre-charged to the SA inputs, SLT and SLC, through pass transistors MS7 and MS8.
When comparing VTSAs with VLSAs, a notable difference is observed in the pulldown networks of the positive feedback configurations in the SA.In the VTSA, these networks, consisting of M S3 and M S4 , are not directly connected to the CM node as in the VLSA.Instead, they are connected to Z T and Z C nodes, as shown in Figure 7a.These nodes are pulled down by M S1 and M S2 , respectively, with their gates controlled by SL C and SL T .This configuration effectively adjusts the V GS of M S3 and M S4 for proper sensing.
The detailed operation of the VTSA is as follows: During the access phase, when SAE = 0 and datum "1" is being sensed, the SL T is at V DD , and SL C is at V DD − ∆V IN,SA , making SO T and SO C pre-charged to V DD and V DD − ∆V IN,SA , respectively, through M S7 and M S8 , similar to the VLSA.Additionally, the gate voltages of M S1 and M S2 , V G,MS1 and V G,MS2 , become V DD − ∆V IN,SA and V DD , respectively.When the evaluation phase begins with SAE = 1, Z T and Z C are pulled down by M S1 and M S2 , respectively.In this configuration, since SL T > SL C , M S1 can drive more current than M S2 , resulting in Z C being pulled down more strongly than Z T (i.e., Z T > Z C ).As a result, compared to the VLSA, the difference between V GS,MS3 and V GS,MS4 is lager in the VTSA, indicating that the amplification can be more stabilized, and thus, V OS can be reduced.This is due to adjustments made not only in the gate voltage conditions of M S3 and M S4 (V G,MS3 < V G,MS4 ), but also in their source voltage conditions (V S,MS3 > V S,MS4 ).
However, compared to the VLSA, the VTSA has an additional pair of nFET transistors, MS1 and MS2, which are involved in the initial amplification of signals. This additional pair not only incurs area overhead but also potentially increases the mismatch effects. That is, the mismatch between MS1 and MS2, which does not need to be considered in VLSAs, can result in unintentional changes in ZT and ZC and degrade the sensing stability. In addition, stacked nFETs degrade the sensing delay and power consumption, as in STSAs.
Figure 7b shows another example of an HYSA, the HYSA-QZ, which is proposed in [33]. This structure pre-charges the internal nodes of the SA more aggressively than the VTSA. The notation QZ here means that not only the output nodes (Q) but also the internal nodes between the MS1-MS2 pair and the MS3-MS4 pair (Z) are pre-charged to the SA inputs in a direction that favors precise sensing. As shown in Figure 7b, not only are SOT and SOC pre-charged to SLT and SLC, but ZT and ZC are also pre-charged to SLT and SLC, respectively. In this manner, the bias condition of the SA becomes more favorable for accurate sensing than in the VTSA.

Capacitor-Based Offset-Compensated SAs
Several previously proposed SAs have addressed transistor mismatches by employing capacitors [34][35][36][37][38][39][40]. These capacitors capture the mismatches between paired transistors, and the stored mismatch information is subsequently utilized to bias the internal nodes of the SA for compensation. Figure 8a illustrates the configuration of a capacitor-based threshold-matching SA (TMSA), as presented in [38]. As demonstrated in Figure 8b,c, the TMSA comprises two main components: a VLSA part and a capacitor-based threshold-matching part. The primary goal of the TMSA is to compensate for the mismatch in the MS1-MS2 pair, which is the most critical pair in a VLSA. This correction is accomplished by initially sampling the Vth of MS1 and MS2 (Vth,MS1 and Vth,MS2) during the pre-charge phase. Then, the sampled Vth,MS1 and Vth,MS2 are stored at the source nodes of MS1 and MS2. This ensures that the currents through MS1 and MS2 during the amplification operation (IS1 and IS2) are independent of their Vth mismatch.
The detailed operation that achieves this objective is illustrated in Figure 9a-d, in the example of sensing datum "1", with a comprehensive explanation provided as follows.
The noticeable point is that VOV,MS1 and VOV,MS2, which determine IS1 and IS2, are independent of Vth,MS1 and Vth,MS2, respectively. Thus, even in the presence of a mismatch between Vth,MS1 and Vth,MS2, IS1 and IS2 can be stably generated (e.g., IS1 < IS2 for datum "1" sensing, as in Figure 9c) at the beginning of the evaluation phase. This makes the TMSA notably more robust than the conventional VLSA, leading to a reduced VOS.
(4) Latching phase (Figure 9d): After the NRSC becomes low in the evaluation phase, this change propagates through a delay buffer to make LAT = VDD, which starts the latching phase. In this phase, CTT and CTC become 0 V, so SOT and SOC can latch the sensing results at the full digital level. This state is kept until the next pre-charge phase, in which CTT and CTC are charged from 0 V up to VDD − Vth,MS1 and VDD − Vth,MS2, respectively.
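The Vth-independence of the overdrive voltages can be checked with a minimal sketch. It assumes a simplified model that is not spelled out in the text: each source node is pre-charged to VDD − Vth and then pushed down by a coupled step ΔV when the evaluation starts, and the gates of MS1 and MS2 sit at VDD − ΔVIN,SA and VDD for datum "1". The function name, gate mapping, and all numbers are hypothetical.

```python
def overdrive(v_gate, vdd, vth, dv_couple):
    """Overdrive VOV = VGS - Vth of an input nFET in a TMSA-like scheme.

    Assumption: the source node was pre-charged to vdd - vth and is then
    lowered by dv_couple through the capacitor at the start of evaluation.
    """
    v_source = (vdd - vth) - dv_couple
    return (v_gate - v_source) - vth


VDD, DV_IN, DV_C = 0.8, 0.05, 0.2   # illustrative values in volts

# First a matched pair, then a pair with 60 mV of Vth mismatch.
results = []
for vth1, vth2 in [(0.30, 0.30), (0.33, 0.27)]:
    vov1 = overdrive(VDD - DV_IN, VDD, vth1, DV_C)   # MS1 gate (assumed mapping)
    vov2 = overdrive(VDD, VDD, vth2, DV_C)           # MS2 gate (assumed mapping)
    results.append((vov1, vov2))
    print(f"Vth = ({vth1:.2f}, {vth2:.2f}) V -> VOV1 = {vov1:.3f} V, VOV2 = {vov2:.3f} V")
```

In this model the Vth term cancels algebraically (VOV = VG − VDD + ΔV), so both pairs give the same overdrive values and VOV1 < VOV2, consistent with IS1 < IS2 for datum "1".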
Although the TMSA effectively reduces the VOS by compensating for the mismatch between MS1 and MS2, there are several shortcomings in this structure. First, the structure is still subject to the mismatch between the capacitors C0 and C1. This mismatch, however, is typically much smaller than the transistor Vth mismatch. Second, the implementation of capacitors and delay buffers in the TMSA results in a significant increase in power consumption and area requirements. In particular, because a sufficiently large ΔV is necessary to turn on MS1 and MS2 in the early stage of the amplification, large capacitors are unavoidable for C0 and C1. Consequently, a significant amount of power is required to charge the NRSC from 0 V to VDD in the pre-charge phase. The area overhead, however, can be avoided by placing metal-oxide-metal (MOM) capacitors on top of the circuit layout [39].
As an alternative approach, the variation-tolerant small-signal SA (VTS-SA) is proposed in [39], specifically addressing mismatches between the two inverters in the SA. This is achieved through the utilization of capacitors at the input acceptance part. The structure of the VTS-SA is shown in Figure 10.
The VTS-SA is based on a VLSA composed of MS1-MS2-MS3-MS4, while the SA input nodes, SLT and SLC, are accepted through coupling capacitors CC1 and CC2, respectively. By utilizing capacitors, the VTS-SA can capture and store the trip points of the two inverters in the SA, INV1 (MS1 and MS3) and INV2 (MS2 and MS4), shown in Figure 10. By biasing the two inverters at their respective trip points, the two inverters become highly sensitive to small input voltage variations. That is, even small input voltage changes can push the inverters to switch their output states. This enhanced voltage gain of the inverters contributes to the improved speed of the SA. Furthermore, trip-point biasing in the VTS-SA serves another crucial purpose: it allows the SA to adapt to and account for process variations within the inverters. By individually setting the trip points, the VTS-SA makes each inverter operate primarily in response to input changes, minimizing its dependence on process variations as much as possible.
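Why individual trip-point biasing removes the inverter mismatch can be sketched numerically. The example below uses the textbook long-channel (square-law) expression for an inverter trip point, which is an assumption made here for illustration and only a rough approximation for nanoscale devices; the function name and all device values are hypothetical.

```python
import math


def trip_point(vdd, vtn, vtp_abs, kn, kp):
    """Square-law inverter trip point (input voltage where VIN = VOUT).

    Derived by equating nFET and pFET saturation currents:
    kn/2 * (VM - vtn)^2 = kp/2 * (vdd - VM - vtp_abs)^2.
    """
    r = math.sqrt(kp / kn)
    return (r * (vdd - vtp_abs) + vtn) / (1.0 + r)


VDD = 0.8
# INV1 is nominal; INV2 has a 30 mV higher nFET threshold (illustrative mismatch).
vm1 = trip_point(VDD, vtn=0.30, vtp_abs=0.30, kn=1.0, kp=1.0)
vm2 = trip_point(VDD, vtn=0.33, vtp_abs=0.30, kn=1.0, kp=1.0)

# With one shared bias (here the nominal trip point), INV2 starts away from
# its own trip point; that distance acts like an input-referred offset.
shared_bias = vm1
offset_shared = abs(shared_bias - vm2)
# With individual trip-point biasing, each inverter starts at its own trip point.
offset_individual = abs(vm2 - vm2)

print(f"VM1 = {vm1:.3f} V, VM2 = {vm2:.3f} V")
print(f"offset with shared bias: {offset_shared * 1e3:.1f} mV")
print(f"offset with individual bias: {offset_individual * 1e3:.1f} mV")
```

The sketch only captures the static bias point; residual offset sources in the real circuit, such as the added switch pairs and the coupling capacitors, are outside this model.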
The detailed operations of the VTS-SA are illustrated in Figure 11a-c, where there are three main operation phases: (1) the trip-point bias phase, (2) the access phase, and (3) the evaluation phase.
Although the VTS-SA tries to reduce the VOS by capturing the mismatch between INV1 and INV2 through trip-point biasing, there are several limitations to this structure. First, mismatches in the MS5-MS6, MS7-MS8, and MS9-MS10 pairs are newly introduced in this structure, which limits the VOS reduction. Second, similar to the TMSA, the VTS-SA is still affected by the mismatch between CC1 and CC2, although it is less influential than the transistor mismatch. Third, because the input voltage must be transferred through capacitive coupling, not all of the ΔVIN,SA is delivered to the SA. This inefficiency contributes to an increase in the effective VOS. Fourth, the trip-point biasing process must be completed before the ΔVIN,SA appears between SLT and SLC. This requirement potentially increases the circuit complexity. In addition, a short-circuit current from VDD to VSS is inevitable during the trip-point biasing, resulting in high power consumption.
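The loss through capacitive coupling can be estimated with a simple capacitive-divider sketch. It assumes that the internal SA node presents a lumped parasitic capacitance to ground, which is a modeling assumption rather than a detail from [39]; the function name and the capacitance values are hypothetical.

```python
def coupled_input(dv_in, c_couple, c_par):
    """Voltage step seen at the SA node behind a coupling capacitor.

    Capacitive divider: the step dv_in is attenuated by
    c_couple / (c_couple + c_par), with c_par the assumed node parasitic.
    """
    return dv_in * c_couple / (c_couple + c_par)


DV_IN = 0.05      # dVIN,SA at SLT/SLC in volts (illustrative)
C_PAR = 1e-15     # assumed parasitic capacitance at the inverter input (1 fF)

for c_couple in (1e-15, 4e-15, 9e-15):
    dv_eff = coupled_input(DV_IN, c_couple, C_PAR)
    print(f"CC = {c_couple * 1e15:.0f} fF -> delivered {dv_eff * 1e3:.1f} mV "
          f"of {DV_IN * 1e3:.0f} mV")
```

A larger coupling capacitor delivers more of the input difference but costs area, which is the same area-versus-accuracy tension noted for the TMSA capacitors.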
The current-mode SA with a capacitive offset correction (CSACOC) structure proposed in [40] utilizes a single capacitor for storing the trip points of the inverters, so it is free from capacitor mismatch effects. The schematic of the CSACOC is shown in Figure 12a, and the operation waveforms of its three main control clock signals (the trip-point storage enable, ΦTrs; the trip-point bias enable, ΦTrb; and the sense enable, SAE) are illustrated in Figure 12b.
The key concept of the CSACOC is to store the difference in the trip-point voltages of the two inverters, INV1 and INV2 in Figure 12a. This difference, ΔVTr = VTr1 − VTr2, is stored across the single capacitor, C0. Then, the two inverters are biased to compensate for the trip-point difference, effectively correcting for the mismatch. The operation of the CSACOC unfolds in three phases, as illustrated in Figure 13a-c, with explanations for each provided as follows.
The CSACOC is immune to capacitor mismatch due to the use of a single capacitor, unlike the TMSA and VTS-SA. However, in the previous SAs, the voltage between SLT and SLC is transferred to SOT and SOC through fully turned-on pFETs during the access phase, whereas in the CSACOC, the voltage difference between SOT and SOC follows that of SLT and SLC through partially turned-on pFETs (current-based). This leads to voltage loss, effectively increasing the VOS. In addition, numerous switches and a control signal generation logic are required, which increases the circuit design complexity along with the power and area overhead.

Offset-Compensated Pre-Amplifiers
Another approach to offset compensation is the use of pre-amplifiers that amplify the bit-line signal preceding the SA stage, as seen in [41][42][43][44]. Instead of directly modifying the SA structure, these additional offset-compensating pre-amplifiers are employed in front of the SA. This allows for the required offset compensation while maintaining the original SA structure. One such example is the bit-line pre-charge and pre-amplifying switching pFET circuit (BP²SP), with its structure and key operational waveforms depicted in Figure 14a,b.
As shown in Figure 14b, BP²SP operates in three phases, as explained below.
(1) Pre-charge phase (PCB = 0): In this phase, MS13 and MS14 in BP²SP are turned on to pre-charge BLC and BLT, respectively. This pre-charges BLC and BLT to VDD − Vth,MS15 and VDD − Vth,MS16, respectively, through a diode connection. It ensures that MS15 and MS16 have VGS = Vth, allowing them to turn on immediately, regardless of Vth variations, when BLC or BLT is discharged in the subsequent phase. This compensates for the Vth mismatch between MS15 and MS16. On the SA side, SLT and SLC are pre-discharged to 0 V through MS8 and MS9.
(2) Access phase (PCB = 1, WL = 1): During this phase, the data stored in the selected bit-cell are reflected onto BLT and BLC. In the example shown in Figure 14b, datum "1" is sensed, so BLT remains close to its pre-charge level, VDD − Vth,MS16, while BLC decreases from VDD − Vth,MS15. Because BLC is pre-charged to VDD − Vth,MS15, MS15 turns on as soon as BLC decreases. This causes BLXT to increase rapidly. Simultaneously, COLB is lowered to enable the column MUX, resulting in SLT increasing and SLC remaining at 0 V. As shown in Figure 14b, this phase effectively pre-amplifies the voltage difference between BLT and BLC into the voltage difference between SLT and SLC.
(3) Evaluation phase (SAE = 1): In this phase, SAE is raised, meaning /SAE is lowered. Consequently, the VLSA is enabled to store the final sensing data in the form of a full digital voltage at the SOT and SOC nodes. In addition, during this phase, the bit-line equalization circuit (transmission gate T1) is activated to equalize BLT and BLC. This ensures that the subsequent pre-charge operation of BLT and BLC can start with both bit-lines having the same low voltage level as the initial condition. This equalization step is important for maintaining consistency in the subsequent memory operation.
The operation principle of BP²SP is to use the same pFETs both to pre-charge the bit-lines and to pre-amplify the bit-line voltages. Specifically, by pre-charging the bit-lines so as to capture the Vth variation of the pre-amplifying pFETs, these pFETs can instantly turn on in response to bit-line pair voltage development. This allows the amplified voltage to be observed at SLT and SLC, reducing the required ΔVBL for stable sensing and leading to improvements in speed and power efficiency. However, to bring the bit-line pair to VDD − Vth, it is necessary to ensure that the bit-line voltages are sufficiently lower than VDD − Vth before pre-charge. This requirement increases the circuit complexity, especially when the memory is awakened from power-down mode or standby mode. In addition, after pre-charging the bit-line pair to VDD − Vth, the bit-lines become floating, making them susceptible to noise. Moreover, the initial VGS condition of the pre-amplifier pFETs can vary significantly according to the pre-charge period, which means that the overall speed of the read operation is highly affected by the pre-charge time.
In [43], another pre-amplifier circuit for SRAM, the cross-coupled nFET pre-amplifier and pre-charge circuit (CCN-PP), is presented. The structure and operational waveforms of the CCN-PP are shown in Figure 15a,b. As depicted in Figure 15b, the CCN-PP operates in four phases.
(1) Pre-charge phase (PBE = 0, PCB = 0): During this phase, the pre-charging boost enable signal (PBE) and PCB are low, so the SA input pre-charge circuit (MS3-MS4-MS5) and MS6 are turned on. This maintains VDDSA at VDD, while SLXT and SLXC are pre-charged to VDD. It should be noted that, unlike the conventional pre-charge operation, all the column MUX transistors and bit-line equalization circuits (T1) are turned on. As a result, SLT, SLC, BLT, and BLC are pre-charged through the CCN-PP. Because the CCN-PP is composed of nFETs, there is a threshold voltage drop in the pre-charging voltages. That is, BLT and BLC are pre-charged to VDD − min(Vth,MS1, Vth,MS2).
(2) Access phase 1 (PBE = 1): During this phase, the unselected column MUX transistors are turned off and the PBE is raised. As a result, MS6 is turned off and then the PBEd rises, boosting the VDDSA to VDD + ΔVC through C0 coupling. Thus, the SA inputs, SLXT and SLXC, are also pre-charged to VDD + ΔVC. Accordingly, BLT and BLC can be slightly raised. In this phase, the WL is activated, so BLT and BLC start to develop according to the bit-cell data.
(3) Access phase 2 (PBE = 0, PCB = 1): With PCB rising, SLXT and SLXC are affected by the change in BLT and BLC through the CCN-PP. For example, when accessing datum "1", as shown in Figure 15b, BLC and SLC decrease, causing MS2 to turn on while MS1 is kept turned off. The turned-on MS2 makes SLXC fall while SLXT is kept high, close to VDD + ΔVC. Due to the positive feedback nature of the cross-coupled nFETs, the voltage difference between SLXT and SLXC is larger than that between BLT and BLC, meaning that the bit-line voltage is pre-amplified.
(4) Evaluation phase (SAE = 1): A high SAE activates the SA to latch the data at the SA outputs, SOT and SOC. In addition, similar to BP²SP, the bit-line equalization circuit is activated to provide proper bit-line initial conditions for the subsequent pre-charge phase.
Unlike BP²SP, the initial VGS of the pre-amplifier transistors in the CCN-PP is determined by access phase 1. Thus, the performance is less dependent on the pre-charge period, so a stable speed can be provided with the CCN-PP. However, as in BP²SP, the CCN-PP still suffers from floating BLT and BLC during the pre-charge phase. In addition, the CCN-PP cannot compensate for the mismatch between MS1 and MS2, which is an inferior point compared to BP²SP. Moreover, utilizing the VDDSA boosting circuit can incur a significant amount of power and area overhead.
In [44], the offset-cancelled current SA (OCCSA) is proposed. As shown in Figure 16, the OCCSA uses nFET MUX transistors instead of pFET MUX transistors. Here, the nFET MUX (PSA) operates as a common-gate amplifier, so it effectively pre-amplifies the BL. To bias these PSAs properly with offset-compensating features, BLT and BLC should be pre-charged lower than VDD − Vth,MS1 and VDD − Vth,MS2, respectively. To realize this, a separate supply voltage, Vprebl, is required. However, the incorporation of this new voltage source is highly costly due to its substantial power and area overheads, making the circuit impractical for actual implementation.

Other Structures
In [45], an SA with inherent offset cancellation (SAOC) is proposed, with its structure shown in Figure 17a. The SAOC utilizes pFETs (MS10 and MS11 in Figure 17a) for input reception, connecting SLT and SLC to the gate nodes of these pFETs. Before sensing, by driving SLT and SLC low and toggling the PRE from low to high, the |Vthp| of MS10 and MS11 is captured at the output nodes of the SA, SOT and SOC, respectively. Subsequently, BLT and BLC are transferred to SLT and SLC by the turned-on MUX transistors, while MS10 and MS11 are turned on by the low PRE. This results in the charging of SOT and SOC by MS10 and MS11. In this manner, the SAOC achieves sensing operations while compensating for the mismatch between MS10 and MS11. However, it should be noted that the mismatch between the nFET MUX pair (MS6 and MS7) is not compensated, and pulling up SLT and SLC with nFETs based on BLT and BLC incurs losses when transferring the BL voltage difference to ΔVIN,SA.
The TMSA, VTS-SA, and CSACOC directly capture mismatches in SAs, utilizing pacitor(s).In this manner, the VOS can be further reduced compared to the STSA, V and HYSA-QZ.However, this improvement comes at a cost: introducing addit phases or control signals, biasing through short circuit currents, and using large capac increase the SA delay and energy consumption significantly.The trade-off betwee delay/energy and SA delay/energy becomes evident in this context.More precise com sation of SA mismatches can result in a smaller VOS and reduced BL delay and en However, achieving this delicacy requires additional circuit components, which can to increased SA delay and energy consumption.In [46], the body-biasing technique is used at critical sensing transistors for auto-offset mitigation features.A differential-input body-biased sense amplifier with floating output nodes (DIBBSA-FL) and a differential-input body-biased sense amplifier with pre-discharge output nodes (DIBBSA-PD) are shown in Figure 17b,c, respectively.The difference between the DIBBSA-FL and the DIBBSA-PD is that the DIBBSA-PD has additional transistors, M S8 and M S9 , to predischarge SO T and SO C , while the DIBBSA-FL only equalizes SO T and SO C .The operations of DIBBSA-FL and DIBBSA-PD are as follows.During the sensing operation, the SAEB decreases and M S3 and M S4 turn on.Simultaneously, when BL T is higher than BL C , through the body-bias effect on M S1 , M S2 , M S3 , and M S4 , M S1 and M S3 become forward body-biased and M S2 and M S4 become reverse body-biased.Therefore, SO T pulls up much faster than SO C .However, recently, 3D FETs such as the FinFET and GAA FET have become commonly used.In these technologies, the body effect is nearly negligible.Therefore, using the body-bias technique in recent technologies is not suitable.
Figure 17d shows the cancellation based on delay and offset relation (CDOR) structure [47]. Before the actual sensing operation, the mismatch in the SA is captured by a preliminary sensing operation in which SL T and SL C are both set to V DD. Because of the mismatch in the SA, SO T and SO C, which are connected to the gates of M S15 and M S14, respectively, become (1, 0) or (0, 1). When SO T and SO C are (1, 0), the pull-up strength on the SO T side is higher than that on the SO C side. Simultaneously, Q and QB become V DD and V SS, turning off M S6 and M S7. In the case of (SO T, SO C) = (1, 0), M S14 turns on and M S15 turns off, lowering SL T. Due to the decreased SL T, the pull-up strength of the SO C side becomes stronger, which mitigates the offset. However, adjusting the voltage accurately is highly challenging, because the amount of voltage change depends strongly on the offset mitigation activation time and the sizes of the M S6 and M S7 transistors.

Comparison
Table 1 summarizes the comparison among the SRAM sensing circuit designs covered in Section 3. Unlike the conventional SAs (VLSA and CLSA), the STSA, VTSA, and HYSA-QZ drive or pre-charge the internal nodes of the SA in favor of accurate sensing. In this manner, the offset voltage can be efficiently reduced without using additional control signals or additional operation phases. In terms of reducing the V OS, the VTSA and HYSA-QZ, which directly pre-charge the internal nodes using pass gates connected to SL T and SL C, outperform the STSA. This is because the mismatch effects in the gated FETs controlling SL T and SL C in the STSA are larger than the mismatch effects in the transmission gates used by the VTSA or HYSA-QZ to transfer SL T and SL C. Compared with the VTSA, the HYSA-QZ can achieve a smaller V OS because it pre-charges more internal nodes. However, the SA delay is increased in the STSA, VTSA, and HYSA-QZ compared to the VLSA, because of the increased number of stacked transistors.
The TMSA, VTS-SA, and CSA COC directly capture mismatches in SAs by utilizing one or more capacitors. In this manner, the V OS can be further reduced compared to the STSA, VTSA, and HYSA-QZ. However, this improvement comes at a cost: introducing additional phases or control signals, biasing through short-circuit currents, and using large capacitors significantly increase the SA delay and energy consumption. The trade-off between BL delay/energy and SA delay/energy becomes evident in this context. More precise compensation of SA mismatches can result in a smaller V OS and reduced BL delay and energy. However, achieving this precision requires additional circuit components, which can lead to increased SA delay and energy consumption.
The BL pre-charging circuits, BP 2 SP and CCN-PP, offer an alternative approach that captures transistor V th values and reduces the required BL voltage development. They can be implemented more simply than SA mismatch compensation structures because pre-amplifiers have a simpler structure than SAs. However, controlling the BL pre-charge levels can be challenging in practice, especially since the BLs should be floating when diode-connected transistors are used for pre-charging.
In addition to the sensing circuits covered in Section 3, there are several other approaches for reducing V BL requirements [44][45][46][47], as shown in the last four rows of Table 1. However, these methods have specific characteristics that may affect their applicability. One of these structures, the SAOC, addresses the mismatch between the two input pFETs at the beginning of the read access to reduce the V OS. However, the mismatches of the other transistor pairs, which are also critical for the V OS, cannot be compensated. Thus, it may exhibit a larger V OS even than the conventional SAs. In addition, short-circuit current paths are inevitably formed, which limits its practical applicability.
The OCCSA utilizes the MUX transistors as a common-gate amplifier to pre-amplify the V BL. Although this is effective, operating the MUX as an amplifier requires an additional high-voltage source for bit-line pre-charge (V prebl), which incurs significant power and area overheads. In addition, compensating the mismatch between the MUX transistor pair requires a significant amount of time for a separate bit-line pre-charge phase before the access phase, which substantially degrades the cycle time.
The DIBBSA-FL and DIBBSA-PD were proposed in [46]. In these structures, the differential bit-line inputs are transferred to the differential output nodes through pull-up pFETs, while the bodies of the output pull-up pFETs are biased with the bit-lines to enhance sensing accuracy. However, a critical limitation of these approaches is that most recent SRAMs utilize multiple-gate FETs, such as FinFETs and gate-all-around FETs, which exhibit minimal body effects. Consequently, the current and threshold voltage remain nearly independent of body voltage changes, rendering these structures inapplicable.
The CDOR-based offset-compensating sensing circuit was introduced in [47]. This structure captures the mismatch in the SA during the pre-charge phase of the SRAM. This is achieved by enabling the SA (SAE = 1) under the condition SL T = SL C = V DD. In this manner, the mismatch information is stored at the differential output nodes, SO T and SO C. For example, if the mismatch favors pulling SO T low, this mismatch-capturing process makes SO T become 0 while SO C becomes high during the pre-charge phase. Then, when the sensing phase starts, SL T and SL C are calibrated using this stored information to compensate the mismatch. Although the compensation technique is innovative, its accuracy is highly dependent on factors such as the width of the calibration timing and the sizing of the calibration transistors. This dependency can potentially result in an increase in the effective V OS of the SA, which may render the structure less practical.
Figure 18 shows the minimum operating voltage of the SAs across technology nodes. The minimum operating voltage is the lowest supply voltage that satisfies the 6σ sensing yield at an operating frequency of 1 GHz in the 7 nm, 14 nm, and 28 nm processes.

A quantitative comparison among the different SAs covered in Section 3 is shown in Table 2. The SAs are simulated in TSMC 28 nm technology with a four-to-one MUX, V DD = 1.0 V, and 256 bit-cells per column. The distribution of V OS in the SAs is estimated as follows [63]. First, we assume that V OS follows a Gaussian distribution. Thus, P FailSA, the probability of sensing failure, can be expressed as follows:

$$P_{FailSA} = P\left(V_{OS} > \Delta V_{IN,SA}\right) = P\left(Z > \frac{\Delta V_{IN,SA} - \mu_{OS}}{\sigma_{OS}}\right) \tag{1}$$

In (1), ∆V IN,SA is the SA input voltage difference, µ OS is the mean of V OS, σ OS is the standard deviation of V OS, and Z is the standard Gaussian random variable. Second, representing the standard Gaussian cumulative distribution function (CDF) as Φ(z), Equation (1) can be rewritten as follows:

$$P_{FailSA} = 1 - \Phi\left(\frac{\Delta V_{IN,SA} - \mu_{OS}}{\sigma_{OS}}\right) \tag{2}$$

Third, through the inverse function, (2) can be expressed as follows:

$$\Phi^{-1}\left(1 - P_{FailSA}\right) = \frac{\Delta V_{IN,SA} - \mu_{OS}}{\sigma_{OS}} \tag{3}$$
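The extraction of µ OS and σ OS from two simulated operating points can be illustrated with a minimal Python sketch. It assumes that a sensing failure occurs when V OS exceeds ∆V IN,SA, so that relation (3) takes the form Φ⁻¹(1 − P FailSA) = (∆V IN,SA − µ OS)/σ OS. The function names and the numerical values below are hypothetical and serve only to check the algebra; they are not results from this paper.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian, provides cdf() and inv_cdf()

def failure_probability(dv_in, mu_os, sigma_os):
    """Forward model (assumed): P_fail = 1 - Phi((dv_in - mu_os) / sigma_os)."""
    return 1.0 - _STD.cdf((dv_in - mu_os) / sigma_os)

def fit_offset_distribution(dv1, pfail1, dv2, pfail2):
    """Solve two instances of the inverse-CDF relation for (mu_os, sigma_os).

    Each failure probability must lie strictly between 0 and 1,
    otherwise inv_cdf raises statistics.StatisticsError.
    """
    k1 = _STD.inv_cdf(1.0 - pfail1)  # equals (dv1 - mu_os) / sigma_os
    k2 = _STD.inv_cdf(1.0 - pfail2)  # equals (dv2 - mu_os) / sigma_os
    if k1 == k2:
        raise ValueError("the two operating points must give different failure rates")
    sigma_os = (dv1 - dv2) / (k1 - k2)
    mu_os = dv1 - k1 * sigma_os
    return mu_os, sigma_os

def required_input_difference(mu_os, sigma_os, n_sigma=6.0):
    """Input difference at which the offset sits n_sigma deviations below it."""
    return mu_os + n_sigma * sigma_os

if __name__ == "__main__":
    # Hypothetical ground truth, used only to generate two test points.
    true_mu, true_sigma = 0.002, 0.015           # volts
    dv_a, dv_b = 0.010, 0.030                    # volts (second value is assumed)
    p_a = failure_probability(dv_a, true_mu, true_sigma)
    p_b = failure_probability(dv_b, true_mu, true_sigma)
    mu, sigma = fit_offset_distribution(dv_a, p_a, dv_b, p_b)
    print(f"mu_os = {mu * 1e3:.3f} mV, sigma_os = {sigma * 1e3:.3f} mV")
    print(f"6-sigma input requirement = {required_input_difference(mu, sigma) * 1e3:.1f} mV")
```

Because a Monte Carlo run with N samples cannot resolve failure probabilities below 1/N, the two test input differences in such a fit need to be small enough that failures are actually observed.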

Figure 1. Simplified schematic of the conventional SRAM for the read operation.

Figure 2. Schematic of two commonly used SAs in SRAM: (a) voltage-type latch SA (VLSA) and (b) current-type latch SA (CLSA).

Figure 3. Operational waveforms of the signals relevant to the read operation in the conventional SRAM.

Figure 4.
At the beginning of the evaluation phase, the V GS of M S2 (SO T = V DD) is greater than that of M S1 (SO C = V DD − ∆V IN,SA). Consequently, I S2 > I S1 makes SO C fall faster than SO T. This leads to positive feedback, formed by M S1-M S2-M S3-M S4. As a result, SO T and

Figure 9. Four-step operation of TMSA: (a) pre-charge phase, (b) access phase, (c) evaluation phase, and (d) latching phase.

(1) Pre-charge phase (Figure 9a): During this phase, the input and output nodes of the SA (SL T, SL C, SO T, and SO C) are pre-charged to V DD. Then, the top-plate nodes of C 0 and C 1 (CT T and CT C) are pre-charged to V DD − V th,MS1 and V DD − V th,MS2, respectively, and M S1 and M S2 are turned off. This pre-charge is conducted under the assumption that CT T and CT C are initially at 0 V before pre-charging (the rationale for this will be explained). In addition, the common bottom-plate node of C 0 and C 1, NRSC, is pre-charged to V DD by M S8, which is turned on by PCB = 0.
(2) Access phase (Figure 9b): In this phase, SL C is lowered to V DD − ∆V IN,SA by the bit-cell, causing SO C to also be V DD − ∆V IN,SA. In addition, the PCB becomes high, so the common bottom-plate node of C 0 and C 1, NRSC, is left floating high.
(3) Evaluation phase (Figure 9c): This phase starts with the SAE rising, turning on M S7, so the NRSC is pulled down. This results in negative capacitive voltage coupling from NRSC to CT T and CT C through C 0 and C 1, respectively. Thus, CT T and CT C are decreased by ∆V, becoming V DD − V th,MS1 − ∆V and V DD − V th,MS2 − ∆V, respectively. These turn on M S1 and M S2, where the overdrive voltages (V OV = V GS − V th) of M S1 and M S2, V OV,MS1 and V OV,MS2, become as follows:

Figure 10. Structure of VTS-SA.
The VTS-SA is based on a VLSA composed of M S1-M S2-M S3-M S4, while the SA input nodes, SL T and SL C, are accepted through coupling capacitors C C1 and C C2, respectively. By utilizing capacitors, the VTS-SA can capture and store the trip points of the two inverters in the SA, INV 1 (M S1 and M S3) and INV 2 (M S2 and M S4), shown in Figure 10. By biasing the two

Figure 11. Three operation phases of VTS-SA: (a) trip-point bias, (b) access phase, and (c) evaluation phase.

(1) Trip-point bias phase (Figure 11a): In this phase, the input and output are shorted in INV 1 and INV 2 of the SA. As a result, the input and output of INV 1 and INV 2 are set to their respective trip points, V bias,INV1 and V bias,INV2. This is accomplished by turning on the M S7 and M S8 transistors through PRE = 1, while also turning on the header and footer switches M S11 and M S12 with EN = 1. In addition, SAE = 0 in this phase, so that the bottom plates of the coupling capacitors, SLI T and SLI C, are also equal to the trip points of the inverters.
(2) Access phase (Figure 11b): In this phase, the input-output connections are disconnected, and the two trip-point-biased inverters are ready to accept changes in SL T and SL C through capacitive coupling. Specifically, when sensing datum "1", as demonstrated in Figure 11b, SL C is decreased by ∆V IN,SA. Then, SLI C is decreased by ∆V coup through capacitive coupling via C C1. Due to the trip-point bias, this input change of INV 2 leads to a significant change in the output of INV 2, SO T. As a result, an amplified voltage difference of K × ∆V IN,SA, where K > 1, is observed between SO T and SO C. It is important to note that, because the inverters are biased to their respective trip points, the output change is determined almost entirely by the input change and is largely independent of process variations.
(3) Evaluation phase (Figure 11c): In this phase, the SAE becomes high; thus, the two inverters are connected in a cross-coupled fashion by turning on M S10 and M S9. At the same time, the two cross-coupled inverters are isolated from the input by turning off M S5 and M S6. Through the positive feedback of the cross-coupled inverters, the final data are latched onto SO T and SO C at the full digital level, similar to the operation of other SAs.
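The access phase of the VTS-SA can be illustrated with a small-signal Python sketch. It assumes that the step coupled onto the floating inverter input follows the ordinary capacitive-divider ratio, and that an inverter biased at its trip point behaves as a linear amplifier with a fixed gain magnitude. The parasitic capacitance, the inverter gain, and all numerical values are hypothetical and are not taken from the reviewed design.

```python
def coupled_step(dv_in, c_couple, c_parasitic):
    """Voltage step on a floating node driven through a series capacitor.

    Standard capacitive divider: the step is attenuated by
    c_couple / (c_couple + c_parasitic).
    """
    return dv_in * c_couple / (c_couple + c_parasitic)

def effective_gain(c_couple, c_parasitic, inverter_gain):
    """Overall factor K relating the SA input change to the output difference."""
    return inverter_gain * c_couple / (c_couple + c_parasitic)

def access_phase_output_difference(dv_in, c_couple, c_parasitic, inverter_gain):
    """Output difference developed while only one inverter input is disturbed."""
    dv_coup = coupled_step(dv_in, c_couple, c_parasitic)
    return inverter_gain * dv_coup

if __name__ == "__main__":
    # Hypothetical values: 10 fF coupling capacitor, 2 fF parasitic at the
    # inverter input, trip-point gain magnitude of 5, 20 mV input difference.
    c_c, c_p, gain, dv_in = 10e-15, 2e-15, 5.0, 0.020
    k = effective_gain(c_c, c_p, gain)
    dv_out = access_phase_output_difference(dv_in, c_c, c_p, gain)
    print(f"K = {k:.3f}, output difference = {dv_out * 1e3:.1f} mV")
```

In this simplified picture, K exceeds 1 only when the inverter gain outweighs the attenuation of the capacitive divider, which is one reason the coupling capacitors need to be large relative to the parasitic capacitance at the inverter inputs.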

Figure 12. (a) Schematic of CSA COC and (b) operation waveforms of three control clock signals.

Figure 15. (a) Schematic of CCN-PP and (b) its operational waveforms.

(1) Pre-charge phase (PBE = 0, PCB = 0): During this phase, the pre-charging boost enable signal (PBE) and PCB are low, so the SA input pre-charge circuit (M S3-M S4-M S5) and M S6 are turned on. This maintains VDDSA at V DD, while SLX T and SLX C are pre-charged to V DD. It should be noted that, unlike the conventional pre-charge operation, all the column MUX transistors and bit-line equalization circuits (T 1) are turned on. As a result, SL T, SL X, BL T, and BL C are pre-charged through the CCN-PP. Because the CCN-PP is composed of nFETs, there is a threshold voltage drop in the pre-charging voltages. That is, BL T and BL C are pre-charged to V DD − min(V th,MS1, V th,MS2).
(2) Access phase 1 (PBE = 1): During this phase, the unselected column MUX transistors are turned off and the PBE is raised. As a result, M S6 is turned off and then the PBEd rises, boosting the VDDSA to V DD + ∆V C through C 0 coupling. Thus, the SA inputs, SLX T and SLX C, are also pre-charged to V DD + ∆V C. Accordingly, BL T and BL C can be slightly raised. In this phase, the WL is activated, so BL T and BL C start to develop according to the bit-cell data.
(3) Access phase 2 (PBE = 0, PCB = 1): With PCB rising, SLX T and SLX C are affected by the change in BL T and BL C through the CCN-PP. For example, when accessing the datum "1", as shown in Figure 15b, BL C and SL C decrease, leading M S2 to be turned on while M S1 is kept turned off. The turned-on M S2 makes SLX C fall while SLX T is

Figure 18. Minimum operating voltage of SAs according to technology scalability.

In (3), both P FailSA and ∆V IN,SA are values obtainable through simulation. With the specified values for P FailSA and ∆V IN,SA, only µ OS and σ OS remain as variables in (3). Thus, with two instances of (3), the two variables, µ OS and σ OS, can be derived. Therefore, due to a 1000-sample Monte Carlo simulation of V INtest1 (∆V IN,SA = 10 mV) and V INtest2

Table 1. Comparison of SRAM sensing circuit designs.

Table 2. Quantitative comparison of SRAM SAs at V DD = 1.0 V in 28 nm technology.